fix(addie): backfill owner test history + stop dual-write for owner runs by EmmaLouise2018 · Pull Request #4264 · adcontextprotocol/adcp

EmmaLouise2018 · 2026-05-08T20:24:58Z

PR 3 of the #4247 unification stack. Stacked on #4263 → #4250.

Summary

Two coupled changes that close the dual-write bug #4247 is targeting:

Backfill historical owner-triggered tests into canonical state. Migration 472_backfill_agent_test_history_to_compliance_runs.sql copies every `agent_test_history` row with `user_id IS NOT NULL` into `agent_compliance_runs` as `triggered_by = 'owner_test'`, carrying the source row id in `observations_json.backfill_source_id` for idempotent re-runs (`WHERE NOT EXISTS` guard). Each agent's most-recent backfilled row also upserts into `agent_compliance_status` — last-write-wins on `(agent_url)` per the contract pinned in #4250's race test — so the dashboard's compliance tile immediately reflects a real verdict for any agent that was tested through Addie pre-PR-#4250 and never ran the heartbeat.

Stop the dual write for owner runs. `evaluate_agent_quality` no longer calls `agentContextDb.recordTest()` when the caller owns the agent — that's exactly the dual-write that #4247 is closing. The legacy `recordTest` call is retained ONLY for third-party runs so a stranger who tests someone else's agent still has a session-scoped audit trail in their own `agent_test_history`. Owner-triggered runs persist exclusively to canonical state going forward.

Why

PR #4250 made owner runs write to the canonical tables but kept the legacy `recordTest` write for backward compatibility. Two writers per owner run → drift surface. This PR removes the legacy write for owner runs and backfills the rows that were already written through the legacy path so we don't lose history.

Brian's bar: don't lose data silently. Migration 472's idempotent guard + `backfill_source_id` provenance means a re-run is a no-op and any future audit can trace back to the original `agent_test_history` row.

Out of scope (deferred)

Drop `agent_test_history`. Gated on:
- 14-day soak from fix(addie): owner evaluate_agent_quality writes to canonical compliance state #4250 deploy
- 7-day soak from feat(dashboard): surface verdict_source + per-run triggered_by badge #4263 deploy
- S3 cold-storage export of `user_id IS NULL` rows (the third-party / scheduled / unattributed cohort) so audit trail isn't lost
- Row-count delta verification on staging
  Migration 472 documents this in its trailing comment.
Collapse `agent_contexts.last_test_*` into a derived view — PR 4 of the Unify compliance state: every storyboard run writes to one canonical path (heartbeat + Addie + dashboard tests) #4247 stack.

Stacked on

feat(dashboard): surface verdict_source + per-run triggered_by badge #4263 (PR 2 of Unify compliance state: every storyboard run writes to one canonical path (heartbeat + Addie + dashboard tests) #4247) — dashboard surfaces verdict_source + triggered_by badge
fix(addie): owner evaluate_agent_quality writes to canonical compliance state #4250 (PR 1 of Unify compliance state: every storyboard run writes to one canonical path (heartbeat + Addie + dashboard tests) #4247) — owner evaluate_agent_quality writes to canonical compliance state

Merge order: #4250 → #4263 → this PR.

Test plan

`tsc --noEmit -p server/tsconfig.json` clean
Migration is idempotent — re-running on a database with a partial backfill is a no-op via the `WHERE NOT EXISTS` guard on `backfill_source_id`
Staging dry-run: count owner-triggered `agent_test_history` rows pre-migration vs `agent_compliance_runs` rows added — should match exactly
Smoke after deploy: run `evaluate_agent_quality` as owner, confirm `agent_compliance_runs` gets a `triggered_by='owner_test'` row AND `agent_test_history` does NOT get a new row
Smoke after deploy: third-party runs still write to `agent_test_history` (session audit trail preserved)
`/dashboard/agents` history panel shows historical "Your test" badges from the backfill (proves the migration landed and the rendered `triggered_by` is correct)

…ce state Closes the 12-hour gap between owner-triggered storyboard runs and the public /api/registry/agents/:url/compliance endpoint (issue #4247, PR 1 of 4). When evaluate_agent_quality is triggered by the agent owner, the result is now written to agent_compliance_status + agent_compliance_runs + agent_storyboard_status with triggered_by = 'owner_test'. Non-owner runs continue writing to agent_test_history (deprecated in PR 3). Migration 471 adds 'owner_test' to both triggered_by CHECK constraints. notifyComplianceChange is intentionally suppressed for owner runs to prevent iteration-loop Slack spam. https://claude.ai/code/session_01UNHkGhBXk9XD2dpzvSLdhb

Build-generated output produced by npm run build; matches the tracking pattern of member-agents-openapi.js and registry.js already in dist/schemas/. https://claude.ai/code/session_01UNHkGhBXk9XD2dpzvSLdhb

@EmmaLouise2018

…ns test Address review feedback from @EmmaLouise2018 on PR #4250: 1. `verdict_source` field on /api/registry/agents/:url/compliance — `AgentComplianceDetailSchema` gains optional `verdict_source`: 'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null — `getComplianceStatus` and `bulkGetComplianceStatus` join `agent_compliance_runs` via LATERAL subquery (dry_run=false, ORDER BY tested_at DESC LIMIT 1) to surface the triggered_by of the most recent run. No migration needed. — Endpoint response emits `verdict_source: status.last_triggered_by`. — `AgentComplianceStatus` interface gets `last_triggered_by` field. 2. Last-write-wins contract test — New `compliance-db-last-write-wins.test.ts` pins the ON CONFLICT DO UPDATE semantics: every recordComplianceRun call overwrites agent_compliance_status regardless of triggered_by source. A future change to first-write-wins or priority ordering would break these tests. https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ

PR 2 of the #4247 unification stack. Reads two fields PR #4250 added to the compliance API but the dashboard wasn't yet rendering: - compliance tile: appends "(your test)" / "(heartbeat)" / "(manual)" / "(webhook)" after Last checked, so operators see whether the current verdict came from their own evaluate_agent_quality run or the scheduled heartbeat. - history panel: per-run badge with the same source label, info-blue for owner_test and neutral for the rest. Pre-PR-1 rows render with neutral — no regression. No backend changes; pure UI surfacing of fields already in the API. Stacked on PR #4250.

PR 3 of the #4247 unification stack. Migration 472 backfills every agent_test_history row with a user_id into agent_compliance_runs as triggered_by='owner_test', carrying the source id in observations_json.backfill_source_id for idempotent re-runs. Each agent's latest backfilled row upserts into agent_compliance_status so the dashboard immediately reflects a real verdict for agents tested through Addie pre-PR-#4250. evaluate_agent_quality stops calling recordTest() when the caller owns the agent — that was the dual-write that #4247 is closing. recordTest is retained ONLY for third-party runs so strangers testing someone else's agent still have a session-scoped audit trail. Drop of agent_test_history table is deferred behind the 14-day soak from #4250 + 7-day soak from #4263 + S3 cold-storage export of non-owner rows. Migration 472 documents this in its trailing comment. Stacked on #4263 → #4250.

bokelley · 2026-05-09T00:02:34Z

Code review (expert pass): block — three issues that need addressing before this lands.

1. Backfill is not chunked or transactional safety-netted.
server/src/db/migrations/472_backfill_agent_test_history_to_compliance_runs.sql:31-58 is a single bulk INSERT … SELECT. On any prod-sized agent_test_history, this holds locks the whole time and is a single point of failure. Per repo convention (see feedback_prod_runnable_scripts_path.md precedent), prod backfill belongs in server/src/scripts/ with chunked windows + dry-run, not a release-command migration. At minimum, chunk by started_at window inside a DO $$ block.

2. Status-priority contract drift between PR and #4250's test.
Migration 472:80-122 implements first-newer-wins via last_checked_at < EXCLUDED.last_checked_at guards on the upsert. PR #4250's compliance-db-last-write-wins.test.ts asserts plain last-write-wins. The PR body says "heartbeat always wins on freshness" — pick one. The test in #4250 will pass at #4250 time and start failing the contract this migration enforces. Reconcile before merge.

3. Partial dual-write window is unguarded.
Between #4250 deploy (canonical writes start) and this PR's deploy (backfill runs), owner runs write to BOTH canonical and agent_test_history. The migration's WHERE NOT EXISTS … backfill_source_id guard correctly skips already-backfilled rows — but owner runs in that window have no backfill_source_id linkage, so they'll be inserted twice into agent_compliance_runs (once by #4250's live write path, once by the backfill).

Add a guard: AND ath.started_at < (SELECT MIN(tested_at) FROM agent_compliance_runs WHERE triggered_by='owner_test') or equivalent — only backfill rows from before the canonical-write cutover.

#4263 is clean and #4268 is solid; this PR is the chain blocker.

claude and others added 5 commits May 8, 2026 19:00

chore(dist): add generated onboarding-openapi schema files

7059147

Build-generated output produced by npm run build; matches the tracking pattern of member-agents-openapi.js and registry.js already in dist/schemas/. https://claude.ai/code/session_01UNHkGhBXk9XD2dpzvSLdhb

This was referenced May 8, 2026

feat(compliance): derive agent_context.last_test_* from canonical runs #4268

Open

chore(compliance): drop agent_test_history + last_test_* — final cleanup #4274

Open

bokelley mentioned this pull request May 9, 2026

fix(addie): owner evaluate_agent_quality writes to canonical compliance state #4250

Draft

bokelley mentioned this pull request May 9, 2026

feat(dashboard): surface verdict_source + per-run triggered_by badge #4263

Open

5 tasks

EmmaLouise2018 force-pushed the EmmaLouise2018/unification-pr2-dashboard-triggered-by branch from 47b26d4 to fd7d406 Compare May 9, 2026 01:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(addie): backfill owner test history + stop dual-write for owner runs#4264

fix(addie): backfill owner test history + stop dual-write for owner runs#4264
EmmaLouise2018 wants to merge 5 commits intoEmmaLouise2018/unification-pr2-dashboard-triggered-byfrom
EmmaLouise2018/unification-pr3-backfill-and-drop-test-history

EmmaLouise2018 commented May 8, 2026

Uh oh!

bokelley commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

EmmaLouise2018 commented May 8, 2026

Summary

Why

Out of scope (deferred)

Stacked on

Test plan

Uh oh!

bokelley commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants